Lab 03 - Introduction to Python - external libraries

1. Activity Identity
| Activity title | Introduction to Robotics |
|---|---|
| Topic | Python / AI / NLP |
| Authors | Institute of Robotics and Machine Intelligence Dominik Belter, Jakub Chudzinski, Marcin Czajka, Kamil Młodzikowski |
| Target learners | Bachelor (Computer Science / IT, Robotics) |
| Estimated duration | 1.5 hours |
| Difficulty level | Beginner |
| FOSSBot environment | Hybrid |
| Licence | CC BY 4.0 |
2. Learning Objectives and Competences
| ID | Learning outcome | Related competences | Assessment evidence |
|---|---|---|---|
| LO1 | Students will be able to install and isolate external Python libraries using venv and pip. | Knowledge of Python tooling; selecting programming tools | Screenshot of the working virtual environment after pip install |
| LO2 | Students will be able to implement a small classical text-classification pipeline with scikit-learn (TF-IDF + LogisticRegression). | Selecting programming tools; using libraries for designing perception components | Working classifier_sklearn.py and its JSON output on the basic test file |
| LO3 | Students will be able to use a pretrained multilingual model from sentence-transformers to match text by meaning. | Using libraries for designing perception components; selecting programming tools | Working classifier_st.py and its JSON output on the multilingual test file |
3. Prerequisites
A workstation running Linux with a working network connection.
Basic computer literacy: comfortable using a keyboard and mouse, opening applications, capturing screenshots.
Basic Python knowledge: variables, functions, lists, dictionaries,
if/else.
4. Required Material and Setup
| Category | Item | Version / Quantity | Notes |
|---|---|---|---|
| Hardware | Workstation | 1 per student | Any Linux PC. |
| Software | Python 3.10+, pip, venv | bundled with most Linux distributions | Pre-installed on the lab workstations. |
| Software | git | bundled with most Linux distributions | Used to clone the starter repository. |
| Dataset / model | paraphrase-multilingual-MiniLM-L12-v2 | downloaded on first run | Around 120 MB, cached after the first use. Hosted on Hugging Face. |
| Starter code | fossbot-text-to-cmd | from GitHub | Provides the CLI skeleton, dataset and TODO blocks you will fill in. |
5. Safety, Ethics and Accessibility Notes
The only risks in this lab are operational:
- pip install runs arbitrary code from PyPI. Always inspect requirements.txt before installing and only install from the file provided with the starter.
- The virtual environment isolates dependencies from your system Python. Do not run pip install outside the activated venv unless you know what you are doing.
6. Scenario and Problem Statement
In this lab you will build a small command-line tool that takes a
natural-language command (for example "go forward") and
outputs the corresponding wheel motor speeds as JSON - the kind of
format a low-level robot driver would consume.
You will implement two text-classifiers and compare them:
- A classical machine-learning pipeline built from scratch with scikit-learn (TF-IDF + LogisticRegression).
- A pretrained multilingual sentence-transformer used through similarity matching.
7. Lab Workflow
| Phase | Student action | Expected output | Time |
|---|---|---|---|
| 1. Setup | Create venv, install dependencies | Working environment | 10 min |
| 2. Classical ML | Implement TF-IDF + LogisticRegression in classifier_sklearn.py | sklearn classifier passes basic test | 25 min |
| 3. Pretrained AI | Implement similarity matcher in classifier_st.py | sentence-transformer classifier works | 20 min |
| 4. Experiments | Add the two multilingual runs and inspect the diff | 4 JSON outputs | 10 min |
| 5. Understand | Read the conceptual explanation | Understand the role of features vs classifier | 10 min |
| 6. Bonus (optional) | Drive a physical robot with your classifier | Robot moves on text commands | - |
| 7. Cleanup | Deactivate venv, remove starter directory | Clean /tmp for the next user | 2 min |
| 8. Reflection | Answer the analysis questions | Short answers | 13 min |
8. Step-by-Step Instructions
Step 1 - Environment preparation
💡 Lab workstation credentials. Every workstation in the lab uses the same local account: username put, password lrm.
- Log in to your lab workstation.
- Open a terminal (Ctrl+Alt+T on Ubuntu).
- Clean up state from any previous lab session. Remove leftover screenshots and any starter directory from a previous run, so your final submission only contains artifacts from this session and git clone does not fail with destination path already exists:
rm -rf ~/Pictures/Screenshots /tmp/fossbot-text-to-cmd
- Clone the starter repository into a fresh directory and enter it:
cd /tmp
git clone https://github.com/LRMPUT/fossbot-text-to-cmd.git
cd fossbot-text-to-cmd
💡 Tip: All lab work happens in /tmp rather than in your home directory. /tmp is the conventional location for scratch work and is wiped on every reboot, so the workstation stays clean for the next user.
- Create an isolated Python environment with venv and activate it:
python3 -m venv .venv
source .venv/bin/activate
After activation your prompt should change to start with (.venv). From now on, every python and pip command runs inside this environment, not the system one.
- Install a CPU-only PyTorch first. By default pip would download the CUDA build of PyTorch (~5 GB of NVIDIA libraries). We do not need GPU support in this lab, so pull the much smaller CPU build from PyTorch’s dedicated index:
pip install torch --index-url https://download.pytorch.org/whl/cpu
- Install the remaining dependencies declared in requirements.txt:
pip install -r requirements.txt
Together the two commands download roughly 900 MB of packages. The first install takes a few minutes.
- Verify the install by importing each library:
python -c "import pandas, sklearn, sentence_transformers; print('OK')"
Expected result: The terminal prints OK. If anything fails to import, re-read the pip install output for an error and re-run the install.
📸 Capture for submission: screenshot the terminal showing the successful OK line and a prompt that starts with (.venv).
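The (.venv) prompt prefix is only a visual cue. As an extra sanity check, the interpreter itself can report whether it belongs to a virtual environment. A minimal sketch using only the standard library (the helper name in_virtualenv is ours and is not part of the starter code):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("inside venv:", in_virtualenv())
print("interpreter:", sys.executable)
```

Run it with the venv activated and the interpreter path should point into the .venv directory of the starter.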
Step 2 - Classical ML classifier
You will now fill in src/classifier_sklearn.py. The
class builds a tiny but complete classical-ML text classifier from two
scikit-learn building blocks:
- TF-IDF (Term Frequency * Inverse Document Frequency) turns each text into a vector of numbers. Every dimension corresponds to one word (or word pair) that appears in the training data, weighted by how often it occurs in that text and how rare it is overall.
- LogisticRegression learns linear decision boundaries between the action classes on top of those TF-IDF vectors and outputs a probability for each class.
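To make the weighting idea concrete, here is a small pure-Python sketch of TF-IDF on a toy corpus. It is an illustration, not the scikit-learn implementation: the corpus is hypothetical, only single words are used, and the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 is one common variant (scikit-learn additionally normalises each vector).

```python
import math
from collections import Counter

# Hypothetical mini-corpus; the real training data lives in the starter CSV.
corpus = ["go forward", "go back", "turn left", "turn right", "stop"]


def tfidf(text: str, documents: list[str]) -> dict[str, float]:
    """Return a {token: weight} map for one text, using a smoothed IDF."""
    n_docs = len(documents)
    doc_tokens = [set(doc.split()) for doc in documents]
    counts = Counter(text.split())
    weights = {}
    for token, tf in counts.items():
        # Document frequency: in how many documents does the token occur?
        df = sum(token in tokens for tokens in doc_tokens)
        if df == 0:
            continue  # token never seen in the corpus: it has no dimension
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[token] = tf * idf
    return weights


weights = tfidf("go forward", corpus)
for token, weight in weights.items():
    print(f"{token:8s} {weight:.3f}")
```

In this toy corpus "forward" occurs in one document and "go" in two, so "forward" receives the larger weight: rarer words count for more.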
Open src/classifier_sklearn.py in any editor and
complete the TODOs (each TODO block in the skeleton also has a direct
link to the relevant function docs):
- TODO 1 - build the pipeline. In __init__, create a sklearn Pipeline with two named steps - a TfidfVectorizer first, then a LogisticRegression:
self.pipeline = Pipeline([
    ("vec", TfidfVectorizer(lowercase=True, ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
- TODO 2 - load the CSV. Use pandas to read the file pointed to by csv_path. The CSV has two columns: text and action.
- TODO 3 - train / test split. Use train_test_split with test_size=0.2 and random_state=42 so the result is reproducible:
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["action"], test_size=0.2, random_state=42
)
- TODO 4 - fit the pipeline on the training split.
- TODO 5 - print accuracy. Predict on the test set, compute the accuracy with accuracy_score, and print the result:
predictions = self.pipeline.predict(x_test)
print(f"Test accuracy: {accuracy_score(y_test, predictions):.3f}")
- TODO 6 and 7 - predict on a single text. This one is a small synthesis challenge - the API of Pipeline.predict and Pipeline.predict_proba has a small twist that you need to work around. Implement predict(self, text) so it returns a (action, confidence) tuple where:
  - action is the predicted class label as a single string.
  - confidence is the maximum class probability (a float between 0 and 1).
Hint - things to figure out from the docs
- Both predict() and predict_proba() expect a list of texts, not a single string - so wrap the input as [text].
- They both return a list/array even for a single input - take element [0] to extract the single result.
- predict_proba() returns a 2D array of shape (n_samples, n_classes). Use .max() on the right row to get the highest probability.
When the file is complete, run the classifier on the basic input file:
python -m src.text_to_wheels \
--input data/examples/basic.txt \
--output outputs/sklearn_basic.json \
--classifier sklearn
Expected result: The terminal prints something like Test accuracy: 0.800 (the split is fixed by random_state=42, but your exact number may still differ slightly between library versions) and Processed 7 commands with the 'sklearn' classifier. Open outputs/sklearn_basic.json and verify that each command was mapped to the right action (forward, turn_left, stop, …).
📸 Capture for submission: screenshot the terminal showing the test accuracy, the success message and a one-line-per-prediction summary of the result (so the screenshot fits on screen):
python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('outputs/sklearn_basic.json'))]"
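The one-liner above is compact but hard to read. An equivalent, more readable sketch is shown below; it assumes only the three fields the one-liner already uses (input, action, confidence). The sample records and their values are hypothetical stand-ins for the real output file.

```python
import json

# Hypothetical records in the shape the one-liner reads; real values come
# from outputs/sklearn_basic.json.
sample_json = """
[
  {"input": "go forward", "action": "forward", "confidence": 0.71},
  {"input": "please stop", "action": "stop", "confidence": 0.64}
]
"""


def summarise(records: list[dict]) -> list[str]:
    """Format one aligned line per prediction."""
    return [
        f"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})"
        for r in records
    ]


lines = summarise(json.loads(sample_json))
print("\n".join(lines))
```

To summarise a real run, replace json.loads(sample_json) with the result of json.load() on the opened output file.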
Experiment with the classical pipeline
Once the classifier passes the basic test, try the following variations one at a time. Re-run after each change and note how the test accuracy changes.
- Change ngram_range from (1, 2) to (1, 1) in your TfidfVectorizer. This drops bigrams - the model only sees individual words now. Does accuracy go up, down or stay the same? Think about why that might be for our short English commands.
Expected outcome
For our small dataset the accuracy on the held-out test set actually
goes up slightly with (1, 1) - around 0.93
versus 0.80 with (1, 2) or (1, 3). The English
commands are short and the unigrams are already very informative on
their own; adding bigrams introduces extra TF-IDF features that overfit
the small training set and hurt generalisation on the test set. The
takeaway: more features is not always better - on small datasets they
can add noise rather than signal.
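To see how quickly the feature set grows with ngram_range, the following sketch lists the word n-grams of one command. It is a simplification for illustration: the command is hypothetical and the whitespace tokenisation is cruder than what TfidfVectorizer does internally.

```python
def word_ngrams(text: str, ngram_range: tuple[int, int]) -> list[str]:
    """List all word n-grams of a text for n between min_n and max_n."""
    tokens = text.lower().split()
    min_n, max_n = ngram_range
    features = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the text.
        for start in range(len(tokens) - n + 1):
            features.append(" ".join(tokens[start:start + n]))
    return features


command = "turn left now"
print(word_ngrams(command, (1, 1)))  # unigrams only
print(word_ngrams(command, (1, 2)))  # unigrams plus bigrams
```

A three-word command yields three features with (1, 1) and five with (1, 2); across a whole training set the bigrams add many rarely repeated features.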
- Change random_state=42 to another value (for example 7 or 0). Run multilingual.txt again with the sklearn classifier. Does the same class still win the bias collapse, or did another class take over? What does that tell you about the role of random_state?
Expected outcome
backward only wins the collapse for
random_state=42. With random_state=0,
1, 3, 7 or 100 the
dominant prediction switches to stop (15 of 15 multilingual
inputs are predicted as stop). The bias-winner is an
artefact of how the random shuffle happened to split the classes; it is
not a property of the data, the classifier or the Polish/German/Spanish
inputs. The mechanism (all-zero TF-IDF vector -> only biases matter)
is identical for every seed.
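The effect of the seed on class balance can be illustrated without scikit-learn. The sketch below assumes, purely for illustration, 75 rows with 15 per action, and uses Python's own random module, so the concrete counts will not match what train_test_split produces for the same seed.

```python
import random
from collections import Counter

ACTIONS = ["forward", "backward", "turn_left", "turn_right", "stop"]
# Assumption for illustration: 75 rows, 15 per action.
labels = [action for action in ACTIONS for _ in range(15)]


def training_counts(seed: int, test_fraction: float = 0.2) -> Counter:
    """Shuffle the labels with a seed and count classes in the training part."""
    shuffled = labels[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    # The first n_test rows are held out; the rest form the training split.
    return Counter(shuffled[n_test:])


for seed in (0, 7, 42):
    print(seed, dict(training_counts(seed)))
```

The same seed always yields the same counts, while different seeds generally leave slightly different numbers of examples per class in the training split. That small imbalance is the kind of thing the learned biases pick up.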
- Optional - bigger change. Append a few Polish phrases for each action to data/training_commands.csv (e.g. do przodu,forward). Re-run on multilingual.txt. Did the sklearn classifier suddenly become multilingual on Polish? Why or why not?
Expected outcome
With three Polish phrases per action added, all five Polish inputs in
multilingual.txt (do przodu,
skręć w lewo, stop, do tyłu,
skręć w prawo) are now classified correctly. The German and
Spanish inputs are almost all still wrong, because their words are still
missing from the TF-IDF vocabulary - they continue to produce all-zero
vectors and collapse to whichever class wins the bias (now
stop after the new training distribution). The lesson is
concrete: TF-IDF cannot generalise beyond the languages and exact words
in its training data. Full multilingual support would require training
examples for every language you care about.
Revert the changes before moving on so the rest of the lab runs against the original setup.
Step 3 - Pretrained multilingual classifier
You will now fill in src/classifier_st.py. This
classifier needs no training data of its own. It uses a
pretrained multilingual model from the
sentence-transformers library that already understands the
meaning of sentences in around 50 languages.
The idea:
- A pretrained model maps any sentence to a fixed-length vector (an embedding). Sentences with similar meaning end up close to each other in the embedding space, regardless of language.
- For each action we keep a small list of reference phrases. These are defined at the top of classifier_st.py as the TEMPLATES dictionary - three English phrases per action. You do not need to modify it:
TEMPLATES = {
    "forward": ["go forward", "forward", "ahead"],
    "backward": ["go back", "backwards", "reverse"],
    "turn_left": ["turn left", "left", "rotate left"],
    "turn_right": ["turn right", "right", "rotate right"],
    "stop": ["stop", "halt", "stay still"],
}
- We pre-encode all 15 template phrases once in __init__, store the resulting vectors and reuse them on every call. Encoding the templates fresh on every predict() would be wasteful.
- To classify a new input, we encode it and pick the action whose templates are closest to it, measured by cosine similarity.
Open src/classifier_st.py in any editor and complete the
TODOs (each TODO block in the skeleton also has a direct link to the
relevant function docs):
- TODO 1 - load the model. In __init__, instantiate the pretrained model:
self.model = SentenceTransformer(model_name)
The first call downloads the model (around 120 MB) and caches it under ~/.cache/huggingface/.
- TODO 2 - pre-encode the templates. For each action in TEMPLATES, encode its template phrases once and store the resulting matrix in self.template_embeddings:
for action, phrases in TEMPLATES.items():
    self.template_embeddings[action] = self.model.encode(phrases)
- TODO 3, 4, 5 - predict. The main coding challenge. Implement predict(self, text) so it returns a (action, similarity) tuple where action is the action whose templates are most similar to the input and similarity is the cosine similarity of that best match.
Hint - composition pattern
- Encode the input with self.model.encode([text]) (wrap the text in a list so that the result is a 2D array, which is the shape cosine_similarity expects).
- Loop over self.template_embeddings.items(). For each action, compute cosine_similarity(user_vec, embeddings) and take .max() as that action’s score.
- Track the best action and best score across the loop, and return them as a tuple at the end.
When the file is complete, run the classifier on the same basic input:
python -m src.text_to_wheels \
--input data/examples/basic.txt \
--output outputs/st_basic.json \
--classifier st
Expected result: The terminal prints Processed 7 commands with the 'st' classifier. and outputs/st_basic.json looks similar to the sklearn output - every command was mapped to the right action.
📸 Capture for submission: screenshot the terminal showing the success message and the compact summary of the result:
python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('outputs/st_basic.json'))]"
Experiment with the sentence-transformer
Try the following one at a time and observe what changes:
- Reduce the templates. Keep only one phrase per action in TEMPLATES (delete the other two). Re-run on multilingual.txt. Does accuracy hold up, or do some inputs now miss? Why might that be?
Expected outcome
Accuracy holds up - on the standard multilingual.txt you
still get 15 out of 15 correct even with a single template per action.
The pretrained model already understands meaning, so a single anchor
phrase per action is enough; the multilingual capability comes from the
model itself, not from the number of templates. Extra phrases mainly
help on tricky or ambiguous inputs that are equally close to several
actions - they widen the “catchment area” without changing the language
coverage.
- Test your own languages. Append 2-3 phrases in another language you know (Italian, French, Russian, …) to data/examples/multilingual.txt. Re-run and check the predictions. Did the model handle the new languages?
Expected outcome
The model typically handles them well -
paraphrase-multilingual-MiniLM-L12-v2 was pretrained on
around 50 languages, so Italian (“avanti”, “indietro”), French
(“avance”, “arrête”), Russian (“вперёд”, “стой”), Czech (“dopředu”) and
Portuguese (“para frente”) all map to the right action without any extra
work. The occasional miss happens on rarer words, but the coverage is
impressive for a free model of around 120 MB that you did not have to “tell” about
any of those languages.
- Swap the model. Change model_name to "all-MiniLM-L6-v2" (an English-only sibling of our multilingual model). Re-run on both basic.txt and multilingual.txt. What happens? What does this tell you about where the multilingual capability lives?
Expected outcome
On basic.txt (English inputs) the English-only model
still works perfectly - it knows English and the templates are English.
But multilingual.txt collapses to roughly 4 out of 15: a
couple of Polish words happen to be close enough to English ones for the
model to recover, but German and Spanish drop almost entirely. The
lesson: the multilingual behaviour lives in the pretrained
model, not in our templates or in the classifier logic on top.
The model AND the inputs need to share a common embedding space - when
you swap to a model that only learned English, only English inputs
continue to work.
Revert the changes before moving on.
Step 4 - Compare the two approaches
You have two working classifiers and two example files. Now run all four combinations and look at the differences:
# Sklearn on basic.txt (already done in Step 2)
python -m src.text_to_wheels \
--input data/examples/basic.txt \
--output outputs/sklearn_basic.json \
--classifier sklearn
# Sklearn on multilingual
python -m src.text_to_wheels \
--input data/examples/multilingual.txt \
--output outputs/sklearn_multilingual.json \
--classifier sklearn
# Sentence transformer on basic.txt (already done in Step 3)
python -m src.text_to_wheels \
--input data/examples/basic.txt \
--output outputs/st_basic.json \
--classifier st
# Sentence transformer on multilingual
python -m src.text_to_wheels \
--input data/examples/multilingual.txt \
--output outputs/st_multilingual.json \
--classifier st
Compare the four output files. The expected pattern is:
| Input | Classifier | Expected accuracy |
|---|---|---|
| basic.txt | sklearn | high (~7/7) |
| basic.txt | st | high (~7/7) |
| multilingual.txt | sklearn | low (~2/15) |
| multilingual.txt | st | high (~14-15/15) |
Look at outputs/sklearn_multilingual.json - the sklearn
classifier collapses to almost always predicting the same action with
very low confidence. Compare with
outputs/st_multilingual.json, where Polish, German and
Spanish commands all map to the correct action.
📸 Capture for submission: screenshot the compact summaries of both multilingual outputs printed back to back. Run the one-liner once per file:
echo "=== sklearn ===" && python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('outputs/sklearn_multilingual.json'))]" && echo "=== st ===" && python -c "import json; [print(f\"{r['input']:22s} -> {r['action']:12s} ({r['confidence']:.3f})\") for r in json.load(open('outputs/st_multilingual.json'))]"
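If you prefer a programmatic diff over reading two summaries side by side, the sketch below lists the inputs on which two result lists disagree. The records are hypothetical excerpts with made-up confidence values; only the input and action fields are assumed to match the real output files.

```python
# Hypothetical excerpts of the two multilingual outputs.
sklearn_results = [
    {"input": "do przodu", "action": "backward", "confidence": 0.25},
    {"input": "stop", "action": "stop", "confidence": 0.55},
]
st_results = [
    {"input": "do przodu", "action": "forward", "confidence": 0.90},
    {"input": "stop", "action": "stop", "confidence": 0.98},
]


def disagreements(first: list[dict], second: list[dict]) -> list[tuple]:
    """Return (input, action_first, action_second) where predictions differ."""
    second_by_input = {r["input"]: r["action"] for r in second}
    return [
        (r["input"], r["action"], second_by_input[r["input"]])
        for r in first
        if r["input"] in second_by_input
        and r["action"] != second_by_input[r["input"]]
    ]


diff = disagreements(sklearn_results, st_results)
for text, sklearn_action, st_action in diff:
    print(f"{text:22s} sklearn={sklearn_action:12s} st={st_action}")
```

With the real files, load each one with json.load() and pass the two lists to disagreements().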
Step 5 - Why these approaches differ on multilingual input
Every ML classifier operates on numbers. Text must first be turned into a vector. The whole difference between our two approaches is how that conversion happens, not which algorithm consumes the result afterwards.
How sklearn TF-IDF represents text
Step 1 - build a vocabulary.
TfidfVectorizer reads the training split (60 of the 75 examples in
training_commands.csv) and builds a dictionary of unique
tokens. The vocabulary might look like:
vocab = ["go", "forward", "ahead", "turn", "left", "right", "back", "stop", ...]
(say a few hundred unique tokens)
Step 2 - one vector per text. Each text becomes a
vector with one dimension per vocabulary token (so a few hundred
dimensions for our vocabulary). For "go forward":
[1, 1, 0, 0, 0, 0, 0, 0, ...]
^ ^
| └── "forward" is present
└───── "go" is present
all other positions are zero
(In practice the values are not 1 or 0 but TF * IDF weights; the principle is the same.)
Step 3 - the classifier learns.
LogisticRegression learns rules like “when the positions
for go and forward are set, the answer is
forward”.
What happens with "fahre vorwärts"?
TF-IDF looks up "fahre" in the vocabulary -> NOT THERE (never seen)
TF-IDF looks up "vorwärts" in the vocabulary -> NOT THERE (never seen)
Vector: [0, 0, 0, 0, 0, 0, ..., 0] (all zeros)
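The lookup can be reproduced in a few lines of plain Python. This sketch uses the eight example tokens from the vocabulary illustration as a hypothetical vocabulary and marks presence with 1 instead of real TF * IDF weights.

```python
# Hypothetical eight-word vocabulary, taken from the illustration above.
vocab = ["go", "forward", "ahead", "turn", "left", "right", "back", "stop"]
index = {token: position for position, token in enumerate(vocab)}


def to_vector(text: str) -> list[int]:
    """Mark which vocabulary tokens occur in the text (1 = present)."""
    vector = [0] * len(vocab)
    for token in text.lower().split():
        if token in index:  # unknown tokens are silently dropped
            vector[index[token]] = 1
    return vector


print(to_vector("go forward"))      # two positions set
print(to_vector("fahre vorwärts"))  # nothing matches the vocabulary
```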
The classifier receives an all-zero vector. There is no signal to act
on, so it always predicts the same class. Which one? LogisticRegression
has learned a per-class bias - a small built-in
tendency that says, in effect, “when nothing else is informative, this
is my default guess for this class”. The class with the highest bias
wins. Biases are not arbitrary; they were adjusted during training and
roughly reflect how often each action appeared in the training split.
With our exact data and random_state=42, the
backward class ended up with the highest bias, so it wins
every tie - that is the
backward-with-confidence-around-0.25 collapse you saw in Step 4. The exact
confidence value may differ by a few thousandths between scikit-learn
versions, but the winning class is reproducible. With a different random
seed a different class would win (in fact stop wins for
most other seeds), but the mechanism is the same.
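The bias mechanism can be sketched numerically. Assuming the multinomial (softmax) form of logistic regression, each class score is weights · x + bias, so an all-zero x leaves only the bias. All weight and bias values below are hypothetical, chosen so that backward has the largest bias as in the text; the real values are learned during fit().

```python
import math

ACTIONS = ["forward", "backward", "turn_left", "turn_right", "stop"]
# Hypothetical per-class biases (real ones are learned during training).
biases = [0.02, 0.21, -0.10, -0.08, -0.05]
# Hypothetical weights for a 3-token vocabulary, one row per class.
weights = [
    [1.2, -0.3, -0.5],
    [-0.4, 1.1, -0.2],
    [0.1, -0.6, 0.9],
    [-0.2, 0.3, 0.4],
    [-0.7, -0.5, -0.6],
]


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(x: list[float]) -> list[float]:
    """Class probabilities for one feature vector x."""
    logits = [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


probs = predict_proba([0.0, 0.0, 0.0])  # all-zero vector: weights drop out
best = max(range(len(ACTIONS)), key=lambda i: probs[i])
print(ACTIONS[best], round(probs[best], 3))
```

With five classes and biases of similar size, the winning probability stays only slightly above 1/5, which is why the collapsed predictions carry such low confidence.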
Key idea: TF-IDF only knows the words it saw during training. This is lexical matching (exact token matching), not semantic matching.
How sentence-transformers represents text
A sentence-transformer is a model that has already been
trained, in our case
paraphrase-multilingual-MiniLM-L12-v2. It was trained on
billions of sentences in around 50 languages. During that pretraining it
learned a crucial property: sentences with similar
meaning get similar vectors - regardless of
the language they are written in.
Step 1 - no vocabulary from our data. The model simply uses what it learned during pretraining.
Step 2 - each text becomes a fixed-length dense
vector (a vector where every dimension carries a meaningful
value, in contrast to TF-IDF where most are zero) that represents its
meaning rather than the literal words (tokens) that
make it up. The length of that vector is a property of the model: our
paraphrase-multilingual-MiniLM-L12-v2 always produces 384
numbers. For illustration:
"go forward" -> [ 0.20, -0.41, 0.56, ..., -0.11] (384 numbers)
"do przodu" -> [ 0.21, -0.43, 0.55, ..., -0.12] very close
"fahre vorwärts" -> [ 0.22, -0.42, 0.54, ..., -0.13] also close
"adelante" -> [ 0.19, -0.40, 0.57, ..., -0.10] also close
"stop" -> [-0.15, 0.71, -0.22, ..., 0.34] FAR from the four above
"halt" -> [-0.16, 0.70, -0.23, ..., 0.33] close to "stop"
# Note: a single number (e.g. 0.21) does not represent a word or a single
# concept. The 384 dimensions together form an abstract coordinate;
# meaning is encoded by the position of the whole vector in this space -
# sentences with similar meaning end up close to each other.
In this 384-dimensional embedding space, sentences cluster by
meaning - English, Polish, German and Spanish forward-commands
all land in the same neighbourhood, while "stop" sits
elsewhere.
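Step 3 - cosine similarity measures the angle between vectors. Vectors pointing in nearly the same direction -> similarity close to 1; unrelated vectors -> close to 0 (vectors pointing in opposite directions give negative values, down to -1).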
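For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below implements it in plain Python on toy 3-dimensional vectors. They are simply the first three illustrative numbers from the listing above, so the results say nothing about real 384-dimensional embeddings. The function name cosine is ours; in the lab you use cosine_similarity from scikit-learn, which works on 2D arrays.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (hypothetical; real embeddings have 384 values).
go_forward = [0.20, -0.41, 0.56]
do_przodu = [0.21, -0.43, 0.55]
stop = [-0.15, 0.71, -0.22]

print(round(cosine(go_forward, do_przodu), 3))  # nearly the same direction
print(round(cosine(go_forward, stop), 3))       # pointing elsewhere
```

Because the measure depends only on direction, scaling a vector by a positive constant does not change its similarity to any other vector.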
What happens with "fahre vorwärts"?
1. The model encodes "fahre vorwärts" -> [0.22, -0.42, 0.54, ...]
(the model understands the meaning - it IS a forward command, in German)
2. Compare to the template embeddings of every action:
- similarity with "go forward" = 0.93 <-- VERY HIGH
- similarity with "go back" = 0.21
- similarity with "turn left" = 0.18
- similarity with "turn right" = 0.17
- similarity with "stop" = 0.12
3. Highest similarity wins -> action = "forward"
This is semantic matching. The model never saw our
specific templates during pretraining, but it learned on billions of
sentence pairs that German "fahre vorwärts" and English
"go forward" mean the same thing.
Side-by-side comparison
| Aspect | TF-IDF (sklearn) | Sentence Transformer |
|---|---|---|
| Text representation | Sparse vector indexed by training-vocabulary tokens | Fixed-length dense vector representing meaning |
| Where does word knowledge come from? | Only from our 75-row training CSV | From pretraining on billions of multilingual sentences |
| Unseen word | Ignored (contributes 0) | Mapped to a meaningful position in the embedding space - the model breaks unknown words into sub-word units it has seen, so even brand-new words can be located near other texts with similar meaning |
| Languages | Only the language of the training data | Multilingual out of the box |
| What is “knowledge”? | Word statistics from our 75 examples | Meaning patterns learned from billions of sentences in many languages |
The takeaway
The choice of classifier on top (LogisticRegression vs
cosine similarity) is not what gives multilingual
capability. The representation of the text - the
features the classifier sees - is what makes the
difference.
TF-IDF features only know the words from our training data. Pretrained sentence embeddings carry knowledge from a massive multilingual corpus. The same lesson applies across modern ML: it is often the features (and how you obtain them, for example by pretraining), not the algorithm, that decides whether a system works.
Step 6 - Optional bonus: drive a physical robot
If you have access to a robot, wire your classifier to a low-level wheel-control API to see the full text -> wheels -> motion pipeline live. A minimal sketch:
- Read a command from input() or a file.
- Call your predict() to obtain an action label.
- Look up the wheel speeds in WHEEL_COMMANDS (in src/wheel_mapping.py).
- Pass those speeds to your robot driver (for example the one you built in Lab 2) and watch the robot move.
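The four steps can be wired together roughly as follows. Everything here is a placeholder: the contents of WHEEL_COMMANDS, the (left, right) tuple layout, fake_predict and fake_driver are hypothetical stand-ins for the real mapping in src/wheel_mapping.py, your classifier and your robot driver.

```python
# Hypothetical stand-in for WHEEL_COMMANDS; the real keys and values may
# differ. Speeds are assumed to be (left, right) in [-1.0, 1.0].
WHEEL_COMMANDS = {
    "forward": (0.5, 0.5),
    "backward": (-0.5, -0.5),
    "turn_left": (-0.3, 0.3),
    "turn_right": (0.3, -0.3),
    "stop": (0.0, 0.0),
}


def fake_predict(text: str) -> tuple[str, float]:
    """Placeholder for your classifier's predict(); plain keyword lookup."""
    for action in WHEEL_COMMANDS:
        if action.replace("_", " ") in text.lower():
            return action, 1.0
    return "stop", 0.0  # fall back to the safe action


def fake_driver(left: float, right: float) -> str:
    """Placeholder for a robot driver; returns the command it would send."""
    return f"set_wheels(left={left:+.1f}, right={right:+.1f})"


def handle(text: str) -> str:
    """Text -> action -> wheel speeds -> driver call."""
    action, _confidence = fake_predict(text)
    left, right = WHEEL_COMMANDS[action]
    return fake_driver(left, right)


print(handle("turn left"))
print(handle("unknown words"))
```

Falling back to stop for unrecognised input is a deliberate choice in this sketch: on a physical robot an unknown command should not cause motion.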
Step 7 - Cleanup
When you have collected all the submission artefacts, leave the workstation in a clean state for the next user. From any directory:
deactivate 2>/dev/null; cd ~ && rm -rf /tmp/fossbot-text-to-cmd
deactivate exits the virtual environment (the trailing 2>/dev/null silences the error message if the venv was already inactive), cd ~ steps out of the starter directory so it can be removed, and rm -rf deletes the starter together with the .venv and all its installed packages.
The Hugging Face model cache under ~/.cache/huggingface/
can be left in place - it speeds up the next session and does not
contain any session-specific state.
Expected result:
ls /tmp/fossbot-text-to-cmd reports
No such file or directory.
9. Analysis Questions
- Look at the confidences in /tmp/fossbot-text-to-cmd/outputs/sklearn_multilingual.json. Almost every input ended up with the same prediction and the same confidence value (around 0.25). Explain why this happens.
- Look at the templates dictionary in /tmp/fossbot-text-to-cmd/src/classifier_st.py. There are only 3 English phrases per action - no Polish, German or Spanish. Why is the sentence-transformer classifier still able to handle multilingual input correctly?
- The wheels field in the JSON output uses values in [-1.0, 1.0]. The mapping from action name to wheel speeds is defined in /tmp/fossbot-text-to-cmd/src/wheel_mapping.py. What would you change there to make the robot turn faster on the spot?
- The sklearn pipeline is trained from scratch on 75 examples; the sentence-transformer model is loaded already trained. List one advantage and one disadvantage of each approach for a project that needs to recognise 50 different commands instead of 5.
After attempting it yourself, you may review the suggested answer
sklearn
- Advantage: deterministic, lightweight and fast - sub-millisecond inference, model file under a megabyte, easy to ship and reproduce.
- Disadvantage: the labelled dataset has to grow roughly linearly with the number of commands - 50 commands need around 15-30 examples per class, i.e. 750-1500 labelled phrases to collect and maintain. The model is also brittle on paraphrases or any other words it never saw during training, and it works in a single language only.
Sentence transformer
- Advantage: multilingual out of the box and robust to paraphrases - you only need 2-3 reference templates per command (about 150 phrases for 50 commands instead of 1500), no labelled dataset to collect.
- Disadvantage: heavier runtime (dependencies and model weights on the order of hundreds of MB, ~100ms inference per query), less interpretable when it makes mistakes, and improving accuracy on specialised commands typically requires fine-tuning the model - which needs a GPU, thousands of paired sentences and significant compute time.
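The data-volume figures in the suggested answer follow from simple multiplication, reproduced here using the per-class and per-command numbers quoted above:

```python
N_COMMANDS = 50
EXAMPLES_PER_CLASS = (15, 30)   # range quoted in the suggested answer
TEMPLATES_PER_COMMAND = 3       # upper end of "2-3 reference templates"

labelled_low = N_COMMANDS * EXAMPLES_PER_CLASS[0]
labelled_high = N_COMMANDS * EXAMPLES_PER_CLASS[1]
templates = N_COMMANDS * TEMPLATES_PER_COMMAND

print(f"sklearn: {labelled_low}-{labelled_high} labelled phrases")
print(f"sentence transformer: {templates} template phrases")
```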
10. Submission Requirements
- A screenshot of the working virtual environment from Step 1 (terminal shows the OK line and the (.venv) prompt).
- A screenshot of the sklearn test accuracy and outputs/sklearn_basic.json from Step 2.
- A screenshot of outputs/st_basic.json from Step 3.
- A screenshot or diff comparing outputs/sklearn_multilingual.json and outputs/st_multilingual.json from Step 4.
- Short answers (2-3 sentences each) to the four analysis questions.
11. References and Open Licence
- scikit-learn documentation - https://scikit-learn.org/stable/
- pandas documentation - https://pandas.pydata.org/docs/
- sentence-transformers documentation - https://www.sbert.net/
- Hugging Face model card for paraphrase-multilingual-MiniLM-L12-v2 - https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Direct links to the specific functions used in this lab are in the
TODO comments of the skeleton files
(src/classifier_sklearn.py,
src/classifier_st.py).
The Creative Commons Attribution 4.0 International (CC BY 4.0) license allows users to share, copy, distribute, and adapt the work, even for commercial purposes, as long as proper credit is given to the original creator.
EU funding disclaimer
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.